PHASE 2A VALIDATION GUIDE
=========================

Purpose
-------
This file defines the repeatable validation process for the Phase 2A Letter Tally Pipeline.

Core Reproducibility Contract
-----------------------------
1) Source inputs are normalized Quran files matching *_normalized.csv or *_normalized.xlsx.
2) When a CSV file and an XLSX file share the same source stem, the CSV file is used.
3) Input discovery is recursive by default.
4) The pipeline generates per-verse 28-letter tally matrices.
5) A reproducibility run succeeds only if repeated clean rebuilds produce identical tracked text outputs (CSV, JSON, and TXT).
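
The discovery rules in items 1 to 3 can be sketched in Python as follows. This is only an illustration, not the code in run_pipeline.py: the function name discover_sources, the recursive flag, and the choice to match stems within the same directory are all assumptions.

```python
from pathlib import Path

# Lower number wins, so CSV is preferred over XLSX for the same stem.
SUFFIX_PRIORITY = {".csv": 0, ".xlsx": 1}


def discover_sources(root, recursive=True):
    """Return one normalized source per (directory, stem), preferring CSV.

    Hypothetical helper: the real pipeline may resolve stems differently,
    for example across the whole workspace rather than per directory.
    """
    root = Path(root)
    walker = root.rglob if recursive else root.glob
    chosen = {}
    for path in walker("*_normalized.*"):
        suffix = path.suffix.lower()
        if not path.is_file() or suffix not in SUFFIX_PRIORITY:
            continue
        key = (path.parent, path.stem)
        current = chosen.get(key)
        if current is None:
            chosen[key] = path
        elif SUFFIX_PRIORITY[suffix] < SUFFIX_PRIORITY[current.suffix.lower()]:
            chosen[key] = path
    # Sorting keeps the processing order stable from run to run.
    return sorted(chosen.values())
```

Returning the paths in sorted order is a deliberate choice in this sketch: a stable processing order is one of the simpler ways to keep rebuilt text outputs identical.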

Preconditions
-------------
1) A Python virtual environment exists at .\.venv
2) Dependencies are installed from requirements.txt.
3) Normalized source files are present somewhere inside the workspace.
4) Scripts are run from the workspace root.

Validation Steps
----------------
1) Run the full pipeline:

   python run_pipeline.py --clean

2) Run reproducibility verification:

   PowerShell -ExecutionPolicy Bypass -File .\verify_repro.ps1

Expected Results
----------------
1) run_pipeline.py reports:

   Pipeline completed successfully.

2) verify_repro.ps1 reports:

   [OK] Reproducibility verification passed.
   [OK] Fresh rebuild succeeded and text outputs are deterministic.

Output Artifacts
----------------
1) Letter matrices:

   out\csv\
   out\json\
   out\excel\

2) Grand summary:

   out\txt\GRAND_SUMMARY.txt

3) Workspace audit:

   out\audit\workspace_registry.csv
   out\audit\workspace_hash_manifest.sha256
   out\audit\workspace_validation_report.txt

Pass Criteria
-------------
A validation run is a PASS only if all of the following hold:

1) All discovered normalized sources are processed without error.
2) Every processed source produces CSV, JSON, and XLSX letter matrix outputs.
3) The total recorded in each JSON output equals the sum of its 28 letter totals.
4) Repeated clean rebuilds produce identical tracked text outputs.
5) workspace_registry_hash_audit.py reports PASS.
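
Criterion 3 can be illustrated with a short Python sketch. It tallies verses against a 28-letter alphabet and then checks that a recorded total equals the sum of the letter totals. Several things here are assumptions and are not taken from the pipeline: the letter set (the standard 28-letter Arabic alphabet, built from code points), the JSON keys "letter_totals" and "total", and all function names.

```python
import json

# Assumption: the standard 28 Arabic letters. The pipeline's actual letter
# set and ordering are defined by its normalization step, not shown here.
LETTERS = [chr(c) for c in (
    0x0627, 0x0628,             # alef, beh
    *range(0x062A, 0x063B),     # teh .. ghain (17 letters)
    *range(0x0641, 0x0649),     # feh .. waw (8 letters)
    0x064A,                     # yeh
)]


def tally_verse(text):
    """Count each of the 28 letters in one verse; other characters are ignored."""
    counts = dict.fromkeys(LETTERS, 0)
    for ch in text:
        if ch in counts:
            counts[ch] += 1
    return counts


def summarize(verses):
    """Sum per-verse tallies into a record with hypothetical JSON keys."""
    totals = dict.fromkeys(LETTERS, 0)
    for verse in verses:
        for letter, n in tally_verse(verse).items():
            totals[letter] += n
    return {"letter_totals": totals, "total": sum(totals.values())}


def check_json_total(json_text):
    """Return True when the recorded total equals the sum of 28 letter totals."""
    record = json.loads(json_text)
    letters = record["letter_totals"]
    return len(letters) == 28 and record["total"] == sum(letters.values())
```

A record for which check_json_total returns False corresponds to the JSON_TOTAL_MISMATCH case described under Failure Interpretation.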

Failure Interpretation
----------------------
1) Missing outputs:
   The audit utility (workspace_registry_hash_audit.py) reports MISSING_OUTPUT.

2) JSON total mismatch:
   The audit utility (workspace_registry_hash_audit.py) reports JSON_TOTAL_MISMATCH.

3) Reproducibility mismatch:
   verify_repro.ps1 reports changed, added, or missing tracked text outputs.
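
The three mismatch categories (changed, added, missing) can be sketched in Python as a comparison of two hash manifests. The internals of verify_repro.ps1 are not shown in this guide, so this is an illustration only: the function names and the manifest shape (a mapping of relative path to SHA-256 hex digest) are assumptions.

```python
def compare_manifests(baseline, rebuilt):
    """Classify differences between two {relative_path: sha256_hex} mappings.

    Hypothetical helper: 'baseline' is the first build and 'rebuilt' is the
    repeated clean rebuild.
    """
    shared = baseline.keys() & rebuilt.keys()
    return {
        # Present in both builds, but with different content hashes.
        "changed": sorted(p for p in shared if baseline[p] != rebuilt[p]),
        # Present only in the rebuild.
        "added": sorted(rebuilt.keys() - baseline.keys()),
        # Present only in the baseline.
        "missing": sorted(baseline.keys() - rebuilt.keys()),
    }


def is_reproducible(diff):
    """A rebuild is reproducible only when all three categories are empty."""
    return not any(diff.values())
```

For example, comparing {"csv/a.csv": "1f"} with {"csv/a.csv": "2e"} places csv/a.csv under "changed", and is_reproducible returns False for that result.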

Current Status
--------------
Phase 2A is frozen.

Validation Status
-----------------
PASS

Reproducibility Status
----------------------
PASS

A successful reproducibility verification has been completed using verify_repro.ps1.

Canonical production artifacts (CSV, JSON, TXT) were rebuilt from source inputs and produced identical outputs across repeated executions.

XLSX exports are excluded from reproducibility verification because workbook metadata may be rewritten during save operations by the export library. Canonical reproducibility is established through CSV, JSON, and TXT artifacts.
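
Restricting the check to the canonical text artifacts can be sketched in Python as follows. The helper hashes only CSV, JSON, and TXT files under an output directory and skips XLSX files. The function name, the suffix set, and the idea of hashing everything under a single directory are assumptions; the exact set of files tracked by verify_repro.ps1 is not specified in this guide.

```python
import hashlib
from pathlib import Path

# Assumption: tracked outputs are identified by suffix. XLSX is left out
# because workbook metadata may differ between otherwise identical saves.
TRACKED_SUFFIXES = {".csv", ".json", ".txt"}


def hash_tracked_outputs(out_dir):
    """Map each tracked file's relative path to its SHA-256 hex digest."""
    out_dir = Path(out_dir)
    manifest = {}
    for path in sorted(out_dir.rglob("*")):
        if path.is_file() and path.suffix.lower() in TRACKED_SUFFIXES:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            # POSIX-style keys keep manifests comparable across platforms.
            manifest[path.relative_to(out_dir).as_posix()] = digest
    return manifest
```

Two clean rebuilds would then count as reproducible when the manifests returned for each build are equal.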

Final Verification Result
-------------------------
[OK] Reproducibility verification passed.
[OK] Fresh rebuild succeeded, naming contract holds, and canonical production outputs are deterministic.

Phase 2A is approved for publication and inclusion in the website source bundle.
